3PFDB+ - Database of Best Representative PSSM Profiles of Protein Families:
Sensitive sequence search techniques play a pivotal role in the post-genome era. The huge volume of sequence data generated by high-throughput sequencing experiments needs to be annotated rapidly and effectively using sensitive sequence search methods, as a first step towards understanding the biological implications of individual sequences. Because functional validation of every sequence from the available genome projects is impractical, bioinformatics tools are widely used to improve the function annotation of sequence data with robust sequence annotation programs. The BLAST suite of programs is the first choice for such annotation of individual protein sequences. Position-Specific Iterative BLAST (PSI-BLAST) is one of the most sensitive programs in the BLAST suite; it searches using Position-Specific Scoring Matrices (PSSMs). PSI-BLAST finds protein sequences similar to a query sequence, constructs a PSSM from the resulting alignment, and can save the PSSM built over its iterations. It can therefore also be used to measure residue conservation in a set of sequences.
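To make the idea of a PSSM concrete, the following is a minimal sketch in Python that derives log-odds scores from a small gap-free toy alignment. It is an illustration only, not the PSI-BLAST procedure: it assumes a uniform background frequency and a fixed pseudocount, whereas PSI-BLAST applies sequence weighting and substitution-matrix-derived pseudocounts. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions (not PSI-BLAST's actual scheme): uniform background
    frequencies and a fixed additive pseudocount per residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        # Count only standard residues; gap characters are ignored.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_sequence(pssm, sequence):
    """Sum the column scores for an ungapped sequence of the same length."""
    return sum(col[res] for col, res in zip(pssm, sequence))


# Hypothetical four-sequence alignment used only for illustration.
toy_alignment = ["ACDW", "ACEW", "ACDW", "GCDW"]
pssm = build_pssm(toy_alignment)
```

Conserved columns (here the second and fourth) give high scores to the conserved residue and negative scores to all others, which is how a PSSM reflects residue conservation. With NCBI BLAST+, the psiblast options -num_iterations and -out_ascii_pssm control the number of iterations and write the resulting matrix to a text file.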
3PFDB+ provides the best representative sequence and profile for each Pfam family, identified using FASSM and HMMER, which together employ motif-based and global sequence-based homology detection strategies. The FASSM (Function Association using Sequence & Structure Motifs) algorithm can be used to assess the ability of each sequence in a given family to generate a PSSM profile that efficiently recognizes the other members of the family. The method is especially useful for detecting difficult relationships across protein families. It has been shown that FASSM can assign function to sequences that belong to difficult categories such as discontinuous domains, small domains and circular permutations in domains.
3PFDB+ - Methodology:
To extract the set of possible representatives, all members of each family were clustered at a sequence identity threshold of 25%. Profiles corresponding to the representative sequences were generated by running PSI-BLAST searches against a non-redundant Pfam family dataset gathered at a 50% identity cut-off. Each of these profiles was then assessed for family coverage using HMMER and FASSM. Family coverage is the efficiency with which a profile identifies other members of the same Pfam family.
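The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the 3PFDB+ pipeline: coverage is taken here as the fraction of the other family members that a candidate's profile detects, the hit sets are assumed to have already been produced by a search program, and all identifiers (family_coverage, best_representative, the sequence names) are hypothetical.

```python
def family_coverage(representative, detected, family_members):
    """Fraction of the other family members found by this candidate's profile.

    Assumption: hits outside the family are ignored, and the candidate
    itself is excluded from both numerator and denominator.
    """
    others = set(family_members) - {representative}
    if not others:
        return 0.0
    return len(set(detected) & others) / len(others)


def best_representative(hits_by_representative, family_members):
    """Return the candidate with the highest coverage and all coverages."""
    coverages = {
        rep: family_coverage(rep, hits, family_members)
        for rep, hits in hits_by_representative.items()
    }
    best = max(coverages, key=coverages.get)
    return best, coverages


# Hypothetical family of five sequences with two candidate representatives.
family = {"s1", "s2", "s3", "s4", "s5"}
hits = {
    "s1": {"s1", "s2", "s3"},                      # finds 2 of 4 others
    "s4": {"s1", "s2", "s3", "s4", "s5", "x9"},    # finds 4 of 4; x9 is outside the family
}
best, coverages = best_representative(hits, family)
```

In this toy case the profile built from s4 covers every other member and would be kept as the best representative; ties are resolved in favour of the first candidate listed.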
3PFDB+ - Database Statistics:
Number of Pfam families in the current release of Pfam (Pfam 26.0): 13673 families
Number of Pfam families with a representative PSSM in 3PFDB+: 13667 families
Number of Pfam families without a representative PSSM in 3PFDB+: 6 families
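As a quick consistency check on the figures above, the fraction of Pfam 26.0 families that have a representative can be computed directly. The variable names are illustrative only.

```python
total_families = 13673        # Pfam 26.0 families
with_representative = 13667   # families with a representative in 3PFDB+

# Families lacking a representative, and the percentage that are covered.
without_representative = total_families - with_representative
percent_covered = 100.0 * with_representative / total_families

print(without_representative, round(percent_covered, 2))
```

The difference matches the 6 families listed without a representative, and roughly 99.96% of families are covered.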
3PFDB+ - Database Features:
Best representative PSSM profile
FASSM-based coverage analysis results
PSIMOT motifs extracted using the PSIMOT routine of FASSM
Sequence-based PCA plot of the protein family
Alignment of Protein Family
Download PSSM, HMM Model and alignment
Details about Pfam families
3PFDB+ - References:
Gaurav K., Gupta N. and Sowdhamini R. (2005) FASSM: Enhanced Function Association in whole genome analysis using Sequence and Structural Motifs. In Silico Biology, 5, 0040.
Finn R.D., et al. (2008) The Pfam protein families database. Nucleic Acids Res., 36, D281–D288.
Altschul S.F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Eddy S. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763.
3PFDB+ - Team : Prof. R. Sowdhamini (Contact : mini@ncbs.res.in)
Agnel P. Joseph, Prashant Shingate and Atul K. Upadhyay